
    Efficient Computation of Multiple Density-Based Clustering Hierarchies

    HDBSCAN*, a state-of-the-art density-based hierarchical clustering method, produces a hierarchical organization of the clusters in a dataset with respect to a parameter mpts. Although the performance of HDBSCAN* is robust with respect to mpts, in the sense that a small change in mpts typically leads to little or no change in the clustering structure, choosing a "good" mpts value can be challenging: depending on the data distribution, a high or a low value of mpts may be more appropriate, and certain clusters may reveal themselves only at particular values of mpts. To explore the results for a range of mpts values, however, one has to run HDBSCAN* independently for each value in the range, which is computationally inefficient. In this paper, we propose an efficient approach that computes all HDBSCAN* hierarchies for a range of mpts values by replacing the graph used by HDBSCAN* with a much smaller graph that is guaranteed to contain the required information. An extensive experimental evaluation shows that, with our approach, one can obtain over one hundred hierarchies for a computational cost equivalent to running HDBSCAN* about twice.
    Comment: A short version of this paper appears at IEEE ICDM 2017. Corrected typos. Revised abstract.
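
    For context, the naive baseline that the paper improves upon can be sketched as follows: for every mpts value, compute core distances, build the complete mutual reachability graph, and extract its minimum spanning tree, from which HDBSCAN* derives its hierarchy. This is a minimal Python sketch under stated assumptions (the core distance counts a point as its own first neighbour; all function names are hypothetical). It is not the paper's smaller-graph method, which the abstract does not specify.

```python
import math


def core_distances(points, mpts):
    """Core distance of each point: distance to its mpts-th nearest neighbour.

    Assumed convention: a point counts as its own first neighbour,
    so mpts = 1 gives a core distance of 0 for every point.
    """
    cores = []
    for p in points:
        dists = sorted(math.dist(p, q) for q in points)
        cores.append(dists[mpts - 1])
    return cores


def mutual_reachability_mst(points, mpts):
    """Prim's algorithm (O(n^2)) on the complete mutual reachability graph."""
    n = len(points)
    cores = core_distances(points, mpts)

    def mreach(i, j):
        # d_mreach(a, b) = max(core(a), core(b), d(a, b))
        return max(cores[i], cores[j], math.dist(points[i], points[j]))

    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge weight connecting v to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                w = mreach(u, v)
                if w < best[v]:
                    best[v], parent[v] = w, u
    return edges


def naive_multi_mpts(points, mpts_values):
    """Baseline: one independent MST (hence one hierarchy) per mpts value."""
    return {m: mutual_reachability_mst(points, m) for m in mpts_values}
```

    For example, `naive_multi_mpts([(0, 0), (1, 0), (0, 1), (10, 10)], range(1, 4))` returns one list of three weighted edges per mpts value. Since every value repeats the full quadratic computation, the cost grows linearly with the number of mpts values, which is the inefficiency the paper addresses.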

    Process identification and control via orthonormal series expansions. Part B: predictive control

    This paper presents an overview of predictive control schemes based on orthonormal basis function models, addressing the problem of model structure selection in predictive control algorithms for single-variable systems. Such models have a dynamic structure built from orthonormal bases of functions, such as Laguerre, Kautz, or generalized orthonormal functions. Different predictive control schemes based on these models are discussed, namely linear controllers with terminal (stabilizing) constraints, robust controllers, and nonlinear controllers. The discussion includes a broad bibliographical survey of the subject, and the closed-loop performance of the analyzed strategies is illustrated by two case studies: a real process (an incubator for newborns) and a simulated isothermal polymerization process.
    pp. 322-336. Funding: Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq).
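
    To make predictive control on an orthonormal basis function model more concrete, the sketch below uses a discrete Laguerre network with pole a as the model, x(k+1) = A x(k) + b u(k), y(k) = c^T x(k), and computes an unconstrained control move with a control horizon of one sample and a penalty lam on control increments. It is a hypothetical, minimal example under stated assumptions (plant identical to the model, no terminal constraints, no robust or nonlinear features, invented function names) and is not one of the schemes surveyed in the paper.

```python
import math


def laguerre_state_space(a, n):
    """Discrete Laguerre network x(k+1) = A x(k) + b u(k), pole 0 <= a < 1.

    A is lower triangular with a on the diagonal and (-a)**(i-j-1) * (1 - a**2)
    below it; b = sqrt(1 - a**2) * [1, -a, a**2, ...].
    """
    beta = 1.0 - a * a
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] = a
        for j in range(i):
            A[i][j] = (-a) ** (i - j - 1) * beta
    b = [math.sqrt(beta) * (-a) ** i for i in range(n)]
    return A, b


def matvec(A, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]


def predictive_control(A, b, c, x, u_prev, r, horizon, lam):
    """Unconstrained predictive control law with a control horizon of 1.

    Holding u constant, y(k+j) = f_j + g_j * u, where f_j is the free response
    and g_j the step-response coefficient. Minimising
    sum_j (r - f_j - g_j u)^2 + lam * (u - u_prev)^2 gives u in closed form.
    """
    x_free = list(x)
    x_step = [0.0] * len(x)
    num, den = lam * u_prev, lam
    for _ in range(horizon):
        x_free = matvec(A, x_free)                                # A^j x
        x_step = [s + bi for s, bi in zip(matvec(A, x_step), b)]  # sum_{i<j} A^i b
        f = sum(ci * xi for ci, xi in zip(c, x_free))
        g = sum(ci * xi for ci, xi in zip(c, x_step))
        num += g * (r - f)
        den += g * g
    return num / den


def simulate(a, c, r, steps=100, horizon=10, lam=0.1):
    """Closed loop in which the plant is assumed identical to the Laguerre model."""
    A, b = laguerre_state_space(a, len(c))
    x, u, y = [0.0] * len(c), 0.0, 0.0
    for _ in range(steps):
        u = predictive_control(A, b, c, x, u, r, horizon, lam)
        x = [s + bi * u for s, bi in zip(matvec(A, x), b)]
        y = sum(ci * xi for ci, xi in zip(c, x))
    return y
```

    Because the penalty acts on increments (u - u_prev), any steady state of this loop satisfies y = r whenever the step-response coefficients do not sum to zero; a real design would add the constraints and robustness features the paper discusses.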

    Process identification and control via orthonormal series expansions. Part A: identification

    This paper presents an overview of the state of the art in the identification of dynamic systems using orthonormal basis function models, such as those based on Laguerre, Kautz, or generalized orthonormal functions. The mathematical foundations of these models, as well as their advantages and possible limitations, are discussed in the contexts of linear identification, robust identification (linear models with parametric uncertainty), and nonlinear identification, accompanied by a broad bibliographical survey of the subject. Specific model realizations within the orthonormal basis function framework, namely linear, Volterra, fuzzy, and neural models, are detailed and compared in terms of representational capability, parsimony, design complexity, and interpretability. Theoretical and practical issues regarding the identification of these models are also presented and illustrated by two case studies involving a simulated isothermal polymerization process.
    pp. 301-321. Funding: Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq).
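
    A common way to identify a linear orthonormal basis function model is to filter the input through the first n basis functions and fit the output weights by least squares, since the model is linear in its coefficients. The sketch below does this for a discrete Laguerre basis with a fixed pole a; the pole value, the zero initial conditions, and all function names are assumptions for illustration, not the paper's procedure.

```python
import math


def laguerre_functions(a, n, length):
    """First n discrete Laguerre functions l_1..l_n with pole 0 <= a < 1.

    l_1(k) = sqrt(1 - a^2) * a^k; each further function is the previous one
    passed through the all-pass filter (z^-1 - a) / (1 - a z^-1).
    """
    funcs = [[math.sqrt(1 - a * a) * a ** k for k in range(length)]]
    for _ in range(1, n):
        prev = funcs[-1]
        nxt = [-a * prev[0]]
        for k in range(1, length):
            nxt.append(a * nxt[k - 1] + prev[k - 1] - a * prev[k])
        funcs.append(nxt)
    return funcs


def filter_input(l, u):
    """Regressor x(k) = sum_j l(j) u(k - j), assuming zero initial conditions."""
    return [sum(l[j] * u[k - j] for j in range(k + 1)) for k in range(len(u))]


def solve(M, v):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(v)
    M = [row[:] + [vi] for row, vi in zip(M, v)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def identify(u, y, a, n):
    """Least-squares estimate of the Laguerre coefficients c_1..c_n."""
    funcs = laguerre_functions(a, n, len(u))
    X = [filter_input(l, u) for l in funcs]  # one regressor sequence per basis function
    XtX = [[sum(p * q for p, q in zip(xi, xj)) for xj in X] for xi in X]
    Xty = [sum(p * q for p, q in zip(xi, y)) for xi in X]
    return solve(XtX, Xty)
```

    The fixed pole keeps the problem linear in the coefficients; choosing the pole (or switching to Kautz or generalized bases for oscillatory dynamics) is the kind of structural decision the paper discusses.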

    Robust Expansion of Uncertain Volterra Kernels into Orthonormal Series

    This paper is concerned with the computation of uncertainty bounds for the expansion of uncertain Volterra models into an orthonormal basis of functions, such as the Laguerre or Kautz bases. This problem has already been addressed for linear systems by an approach in which the uncertainty bounds of the expansion coefficients are estimated from a structured set of impulse responses describing a linear uncertain process. This approach is extended here to nonlinear Volterra models by computing the uncertainty bounds of the expansion coefficients from a structured set of uncertain Volterra kernels. The proposed formulation ensures that the resulting model can represent all the original uncertainties with minimum intervals for the expansion coefficients. An example illustrates the effectiveness of the proposed formulation.
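
    The abstract does not detail the formulation, but a simpler related calculation can be sketched. If each entry of a second-order Volterra kernel h2(k1, k2) is only known to lie in an independent interval [lo, hi], then each projection coefficient c_ij = sum over k1, k2 of h2(k1, k2) l_i(k1) l_j(k2) is a linear function of those entries, and its exact range follows from interval arithmetic. The independence assumption, the Laguerre basis with pole a, and the function names are hypothetical; the paper instead works with a structured set of kernels.

```python
import math


def laguerre_functions(a, n, length):
    """First n discrete Laguerre functions with pole 0 <= a < 1 (all-pass recursion)."""
    funcs = [[math.sqrt(1 - a * a) * a ** k for k in range(length)]]
    for _ in range(1, n):
        prev = funcs[-1]
        nxt = [-a * prev[0]]
        for k in range(1, length):
            nxt.append(a * nxt[k - 1] + prev[k - 1] - a * prev[k])
        funcs.append(nxt)
    return funcs


def coefficient_interval(lo, hi, weights):
    """Exact range of sum_k w_k h_k when each h_k varies independently in [lo_k, hi_k]."""
    low = sum(min(l * w, h * w) for l, h, w in zip(lo, hi, weights))
    high = sum(max(l * w, h * w) for l, h, w in zip(lo, hi, weights))
    return low, high


def volterra2_coefficient_bounds(lo, hi, a, n):
    """Interval for each coefficient c_ij of a second-order kernel expansion.

    lo, hi: M x M matrices bounding the kernel h2(k1, k2) entrywise.
    Returns a dict mapping (i, j) (0-based) to (lower, upper).
    """
    m = len(lo)
    funcs = laguerre_functions(a, n, m)
    flat_lo = [v for row in lo for v in row]  # row-major: k1 outer, k2 inner
    flat_hi = [v for row in hi for v in row]
    bounds = {}
    for i in range(n):
        for j in range(n):
            w = [funcs[i][k1] * funcs[j][k2] for k1 in range(m) for k2 in range(m)]
            bounds[(i, j)] = coefficient_interval(flat_lo, flat_hi, w)
    return bounds
```

    Because each coefficient is linear in the kernel entries, the midpoint of every interval equals the coefficient of the nominal kernel (lo + hi) / 2, and the width grows with the sum of absolute basis weights.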

    Guidelines for the use and interpretation of assays for monitoring autophagy (3rd edition)

    In 2008 we published the first set of guidelines for standardizing research in autophagy. Since then, research on this topic has continued to accelerate, and many new scientists have entered the field. Our knowledge base and relevant new technologies have also been expanding. Accordingly, it is important to update these guidelines for monitoring autophagy in different organisms. Various reviews have described the range of assays that have been used for this purpose. Nevertheless, there continues to be confusion regarding acceptable methods to measure autophagy, especially in multicellular eukaryotes. For example, a key point that needs to be emphasized is that there is a difference between measurements that monitor the numbers or volume of autophagic elements (e.g., autophagosomes or autolysosomes) at any stage of the autophagic process versus those that measure flux through the autophagy pathway (i.e., the complete process including the amount and rate of cargo sequestered and degraded). In particular, a block in macroautophagy that results in autophagosome accumulation must be differentiated from stimuli that increase autophagic activity, defined as increased autophagy induction coupled with increased delivery to, and degradation within, lysosomes (in most higher eukaryotes and some protists such as Dictyostelium) or the vacuole (in plants and fungi). In other words, it is especially important that investigators new to the field understand that the appearance of more autophagosomes does not necessarily equate with more autophagy. In fact, in many cases, autophagosomes accumulate because of a block in trafficking to lysosomes without a concomitant change in autophagosome biogenesis, whereas an increase in autolysosomes may reflect a reduction in degradative activity. It is worth emphasizing here that lysosomal digestion is a stage of autophagy and evaluating its competence is a crucial part of the evaluation of autophagic flux, or complete autophagy.
Here, we present a set of guidelines for the selection and interpretation of methods for use by investigators who aim to examine macroautophagy and related processes, as well as for reviewers who need to provide realistic and reasonable critiques of papers that are focused on these processes. These guidelines are not meant to be a formulaic set of rules, because the appropriate assays depend in part on the question being asked and the system being used. In addition, we emphasize that no individual assay is guaranteed to be the most appropriate one in every situation, and we strongly recommend the use of multiple assays to monitor autophagy. Along these lines, because of the potential for pleiotropic effects due to blocking autophagy through genetic manipulation, it is imperative to delete or knock down more than one autophagy-related gene. In addition, some individual Atg proteins, or groups of proteins, are involved in other cellular pathways, so not all Atg proteins can be used as a specific marker for an autophagic process. In these guidelines, we consider these various methods of assessing autophagy and what information can, or cannot, be obtained from them. Finally, by discussing the merits and limits of particular autophagy assays, we hope to encourage technical innovation in the field.

    A systematic comparative evaluation of biclustering techniques

    Background: Biclustering techniques can simultaneously cluster the rows and columns of a data matrix. These techniques have become very popular for the analysis of gene expression data, since a gene can take part in multiple biological pathways, which in turn may be active only under specific experimental conditions. Several biclustering algorithms have been developed in recent years. To provide guidance on choosing among them, a few comparative studies have been conducted and reported in the literature. In these studies, however, the performance of the methods was evaluated with external measures that have more recently been shown to have undesirable properties. Furthermore, they considered a limited number of algorithms and datasets. Results: We conducted a broader comparative study involving seventeen algorithms, which were run on three synthetic data collections and two real data collections with a more representative number of datasets. For the experiments with synthetic data, five experimental scenarios were studied: different levels of noise, different numbers of implanted biclusters, different levels of symmetric bicluster overlap, different levels of asymmetric bicluster overlap, and different bicluster sizes; the results were assessed with more suitable external measures. For the experiments with real datasets, the results were assessed by gene set enrichment and clustering accuracy. Conclusions: We observed that each algorithm achieved satisfactory results in some of the biclustering tasks in which it was investigated. The choice of the best algorithm for a given application thus depends on the task at hand and the types of patterns that one wants to detect.
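
    To illustrate what "types of patterns" means for a bicluster, the sketch below scores a submatrix (a subset of rows and a subset of columns) with Cheng and Church's mean squared residue, a classic coherence score that is zero when every entry equals a row effect plus a column effect (an additive, or shifting, pattern). It is an illustrative assumption only; it is not necessarily one of the measures used in the study, and the function name is hypothetical.

```python
def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue of the submatrix matrix[rows][cols].

    Residue of entry (i, j): a_ij - rowmean_i - colmean_j + overall mean.
    A value of 0 means the bicluster follows a perfect additive pattern.
    """
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_r, n_c = len(rows), len(cols)
    row_means = [sum(r) / n_c for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_r)) / n_r for j in range(n_c)]
    overall = sum(row_means) / n_r
    return sum(
        (sub[i][j] - row_means[i] - col_means[j] + overall) ** 2
        for i in range(n_r)
        for j in range(n_c)
    ) / (n_r * n_c)
```

    A score like this only rewards additive coherence; biclusters with multiplicative or order-preserving patterns would need other criteria, which is consistent with the conclusion that the best algorithm depends on the patterns sought.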

    Comparing Correlation Coefficients as Dissimilarity Measures for Cancer Classification in Gene Expression Data

    An important analysis performed on gene expression data is sample classification, e.g., the classification of different types or subtypes of cancer. Different classifiers have been employed for this challenging task, among which the k-Nearest Neighbors (kNN) classifier stands out for being both very simple and highly flexible in terms of discriminatory power. Although the choice of a dissimilarity measure is essential to kNN, little effort has been devoted to evaluating how this choice affects its performance in cancer classification. To this end, we compare seven correlation coefficients for cancer classification using kNN. Our comparison suggests that a recently introduced correlation coefficient may perform better than commonly used measures. We also show that rarely considered correlation coefficients can provide competitive results when compared to widely used dissimilarity measures.
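
    A kNN classifier only needs a dissimilarity, so any correlation coefficient r can be plugged in as 1 - r. The sketch below shows this with Pearson and Spearman correlation as two familiar examples; it does not reproduce the seven coefficients compared in the paper, and all function names are hypothetical.

```python
import math
from collections import Counter


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(x):
    """Ranks starting at 1; tied values receive their average rank."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    r = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def knn_predict(train, labels, query, k=3, corr=pearson):
    """Majority vote among the k training samples with the smallest 1 - corr."""
    dissim = sorted((1 - corr(s, query), lab) for s, lab in zip(train, labels))
    votes = Counter(lab for _, lab in dissim[:k])
    return votes.most_common(1)[0][0]
```

    Because 1 - r ignores overall scale and offset, two expression profiles with the same shape but different magnitudes are treated as neighbours, which is one reason correlation-based dissimilarities are attractive for expression data.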

    On the selection of appropriate distances for gene expression data clustering

    Background: Clustering is crucial for gene expression data analysis. As an unsupervised exploratory procedure, its results can help researchers gain insights and formulate new hypotheses about biological data from microarrays. Given the different settings of microarray experiments, clustering proves to be a versatile exploratory tool. It can help to unveil new cancer subtypes or to identify groups of genes that respond similarly to a specific experimental condition. To obtain useful clustering results, however, different parameters of the clustering procedure must be properly tuned. Besides the selection of the clustering method itself, determining which distance to employ between data objects is probably one of the most difficult decisions. Results and conclusions: We analyze how different distances and clustering methods interact regarding their ability to cluster gene expression (i.e., microarray) data. We study 15 distances along with four common clustering methods from the literature on a total of 52 gene expression microarray datasets. Distances are evaluated in a number of different scenarios, including the clustering of cancer tissues and of genes from short time-series expression data, the two main clustering applications in gene expression. Our results support that the selection of an appropriate distance depends on the scenario at hand. Moreover, in each scenario, given the very same clustering method, significant differences in quality may arise from the selection of distinct distance measures. In fact, the selection of an appropriate distance measure can make the difference between meaningful and poor clustering outcomes, even for a suitable clustering method.
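
    The interaction between distance and clustering method can be seen even on a toy example: the same average-linkage procedure groups profiles by magnitude under Euclidean distance but by shape under the Pearson-based distance 1 - r. The following naive sketch is an illustration under these assumptions (toy data, hypothetical function names), not the study's experimental setup.

```python
import math


def euclidean(x, y):
    return math.dist(x, y)


def pearson_distance(x, y):
    """1 - Pearson correlation: 0 for identical shapes, 2 for mirrored ones."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 1 - sxy / math.sqrt(sxx * syy)


def average_linkage(data, n_clusters, dist):
    """Naive agglomerative clustering with average linkage and a pluggable distance."""
    d = [[dist(p, q) for q in data] for p in data]
    clusters = [[i] for i in range(len(data))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean distance over all cross-cluster pairs
                total = sum(d[i][j] for i in clusters[a] for j in clusters[b])
                avg = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or avg < best[0]:
                    best = (avg, a, b)
        _, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]  # b > a, so index a is unaffected
    return sorted(clusters)
```

    With data = [[1, 2, 3], [10, 20, 30], [3, 2, 1], [30, 20, 10]] and two clusters, Euclidean distance pairs profiles 0 and 2 (similar magnitude), whereas the Pearson distance pairs profiles 0 and 1 (same increasing shape), even though the clustering method is unchanged.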

    Clustering of RNA-Seq samples: Comparison study on cancer data

    RNA-Seq is becoming the standard technology for large-scale gene expression level measurements, as it offers a number of advantages over microarrays. Standards for RNA-Seq data analysis, however, are still in their infancy compared to those for microarrays. Clustering, which is essential for understanding gene expression data, has been widely investigated for microarrays. Regarding the clustering of RNA-Seq data, however, a number of questions remain open, resulting in a lack of guidelines for practitioners. Here we evaluate computational steps relevant to clustering cancer samples through an empirical analysis of 15 mRNA-Seq datasets. Our evaluation considers strategies regarding expression estimates, the number of genes retained after non-specific filtering, and data transformations. We evaluate the performance of four clustering algorithms and twelve distance measures that are commonly used for gene expression analysis. The results support that clustering cancer samples based on gene-level quantification should be preferred. Non-specific filtering that leaves a small number of features (1,000) generally yields superior results. The data should be log-transformed prior to cluster analysis. Regarding the choice of clustering algorithm, Average-Linkage and k-medoids generally provide superior recoveries. Although specific cases can benefit from a careful selection of a distance measure, the Symmetric Rank-Magnitude correlation provides consistent and sound results across different scenarios.
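
    The two preprocessing recommendations (log transformation and non-specific filtering down to about 1,000 genes) can be sketched as below. The log2(x + 1) pseudo-count, ranking genes by variance, and applying the filter after the transformation are all assumptions, since the abstract does not state the filtering criterion or the order; the function names are hypothetical.

```python
import math


def log_transform(counts):
    """log2(x + 1) of every entry; the pseudo-count of 1 is an assumption."""
    return [[math.log2(v + 1) for v in row] for row in counts]


def variance(row):
    m = sum(row) / len(row)
    return sum((v - m) ** 2 for v in row) / len(row)


def filter_top_genes(matrix, gene_ids, n_keep=1000):
    """Non-specific filtering: keep the n_keep genes (rows) with highest variance."""
    ranked = sorted(range(len(matrix)), key=lambda g: variance(matrix[g]), reverse=True)
    keep = sorted(ranked[:n_keep])  # preserve the original gene order
    return [matrix[g] for g in keep], [gene_ids[g] for g in keep]


def preprocess(counts, gene_ids, n_keep=1000):
    """Genes x samples matrix -> log-transformed, filtered matrix ready for clustering."""
    return filter_top_genes(log_transform(counts), gene_ids, n_keep)
```

    The filtering is "non-specific" because it ignores sample labels, so it can be applied before unsupervised clustering without leaking class information.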